Abstract. The release of anonymized microdata is an important topic in the fields of
statistical disclosure control (SDC) and privacy preserving data publishing (PPDP), and yet it remains
largely unsolved. In these research fields, k-anonymity has been widely studied as an anonymity
notion, mainly for deterministic anonymization algorithms, and some probabilistic relaxations have
been developed. However, these relaxations are insufficient: they are either weaker than the
original k-anonymity or require strong parametric assumptions. First, we propose Pk-anonymity,
a new probabilistic k-anonymity, and prove that Pk-anonymity is a mathematical extension of
k-anonymity rather than a relaxation. Furthermore, Pk-anonymity requires no parametric assumptions.
This property is significant because it enables us to compare the privacy levels of
probabilistic microdata release algorithms with those of deterministic ones. Second, we apply Pk-anonymity to
the post randomization method (PRAM), which is an SDC algorithm based on randomization. PRAM
is proven to satisfy Pk-anonymity in a controlled way, i.e., one can control PRAM’s parameter so that
Pk-anonymity is satisfied. On the other hand, PRAM is also known to satisfy ε-differential privacy, a
recent popular and strong privacy notion. This fact means that our results significantly enhance PRAM
since they imply the satisfaction of both important notions: k-anonymity and ε-differential privacy.
1 Introduction

Releasing microdata while preserving privacy has been widely studied in the fields of statistical disclosure
control (SDC) and privacy preserving data publishing (PPDP). Microdata has significant value, especially
for data analysts who wish to conduct various types of analyses that involve viewing the whole data and
determining what type of analysis they should conduct.

The most common privacy notion for microdata release is k-anonymity proposed by Samarati and Sweeney
[18,20]. It means that “no one can narrow down a person’s record to fewer than k records.” This semantics is quite
simple and intuitive. Therefore, many studies have been conducted on k-anonymity, and many relevant
privacy notions, such as ℓ-diversity [13], have also been proposed. Among these relevant studies, applying
k-anonymity to probabilistic algorithms is a significant research direction. Most k-anonymization algorithms
deterministically generalize or partition microdata. However, there are probabilistic SDC methods such as
random swapping, random sampling, and the post randomization method (PRAM) [9]. How are these
probabilistic algorithms related to k-anonymity?

Regarding random swapping, for example, Soria-Comas and Domingo-Ferrer answered the above question
by relaxing k-anonymity to a probabilistic k-anonymity, which means that “no one can correctly link a person
to a record with a higher probability than 1/k [19].” Intuitively, this semantics seems to be very close to
that of the original k-anonymity. However, its precise relation to k-anonymity has not been argued, and we
still cannot definitely say that an algorithm satisfying their probabilistic k-anonymity is also k-anonymous.

PRAM was proposed by Kooiman et al. in 1997. It changes data into other random data according to
the probabilities in a transition probability matrix. Agrawal et al. also developed privacy preserving OLAP
(Online Analytical Processing) [3] by retention-replacement perturbation, which is an instantiation of PRAM.
For many years, PRAM’s privacy was not clarified; however, PRAM has recently been proven to satisfy
ε-differential privacy (DP) [12].


Differential privacy [6] is another privacy notion that has attracted a great deal of attention recently.
ε-DP is the original version of DP, and many other relevant notions have been developed, e.g., (ε, δ)-DP,
which is a relaxation of ε-DP.

1.1 Motivations 

Since its proposal, ε-DP has been widely researched and is now known to be a very strong privacy notion.
Thus, it is natural that the satisfaction of ε-DP is important. However, especially in the PPDP field,
k-anonymity is as important as ε-DP, although it takes only re-identification into consideration and several
papers have shown the limitations of k-anonymity [13,10]. This notion is very simple and intuitive; therefore, an
enormous number of techniques have been invented, and as a result, k-anonymity has already spread not only
among researchers but also among businesspeople, doctors, etc., who are conscious about privacy. From
the viewpoint of practice, it is a great merit that people recognize and understand the notion.

Therefore, merging the two notions while preserving their theoretical guarantees in a controlled way is
desirable. However, k-anonymity applies only to deterministic anonymization algorithms, and ε-DP applies
to randomized ones; thus, it has been hard to manage both of them at once until now.

PRAM has several good features, and we believe that it is one of the promising candidates for PPDP. The
anonymization step in PRAM is performed in a record-wise fashion, so anonymizing data in parallel is easy,
and we can extend PRAM to a local perturbation, i.e., an individual anonymizes his/her data before sending
them to the central server. In addition, PRAM does not need generalization, so we can obtain anonymized
data with fine granularity and perform a fine-grained analysis on them. Furthermore, it is known that PRAM
can satisfy ε-DP [12].

Although PRAM has these features and was proposed [9] before the methods satisfying k-anonymity
[20] and ε-DP [6] were proposed, it has been studied less than other approaches in the area of
PPDP. Most popular methods for PPDP are evaluated in the context of k-anonymity. However, PRAM is a
probabilistic method, so it cannot be evaluated in the context of k-anonymity. This means that no one can
compare PRAM with other methods for PPDP using the same measure.

Given the above circumstances, the aim of this paper is twofold. First, we extend k-anonymity to
probabilistic methods (not only PRAM) in order to merge k-anonymity and ε-DP. Second, we evaluate how strongly
PRAM preserves privacy in the context of k-anonymity.


1.2 Contributions 

Our contributions are the following two points. 

Extending k-anonymity for Probabilistic Methods We propose Pk-anonymity, which has the following four
advantages compared to current probabilistic k-anonymity notions.

1. It is defined formally enough to prove that it is a rigorous extension of the original k-anonymity.
Specifically, we prove that k-anonymity and Pk-anonymity are totally equivalent if an anonymization
algorithm is deterministic, in other words, if the algorithm is within the scope of conventional k-anonymization. We
claim that one can consider a set of microdata anonymized using a probabilistic algorithm as k-anonymous
if it is Pk-anonymous.

2. Its semantics is “no one estimates which person the record came from with more than 1/k probability
(regardless of the link’s actual correctness).” From the viewpoint that privacy breaches are not only derived
from correct information, this semantics is stronger than the prevention of only correct links.

3. Pk-anonymity never causes failure of anonymization. Some current probabilistic k-anonymity notions
are defined as “satisfaction of k-anonymity with certain probability.” Unlike these notions, Pk-anonymity
always imposes a definite level of re-identification hardness on the adversary, even though it is defined via
probability theory.

4. It is non-parametric; that is, no assumption on the distribution of raw microdata is necessary.
Furthermore, it does not require any raw microdata to evaluate k.

Applying Pk-anonymity to PRAM Pk-anonymity on PRAM is analyzed. The value of k is derived from
the parameters of PRAM with no parametric assumption. Furthermore, we propose an algorithm that satisfies both
Pk-anonymity and ε-DP for any given values of k and ε.


1.3 Related Work 

On Probabilistic k-anonymity Notions There are many studies on k-anonymity, and it has many
supplemental privacy notions such as ℓ-diversity and t-closeness [10].

There have also been several studies that are relevant to probability.

Wong et al. proposed (α, k)-anonymity [21]. Roughly speaking, (α, k)-anonymity states that (the original)
k-anonymity is satisfied with probability α. Lodha and Thomas proposed (1 − β, k)-anonymity. This is a
relaxation from k-anonymity in a sample to that in a population. These two notions are essentially based on
the original k-anonymity and are relaxations that allow failures of anonymization with a certain probability.
Pk-anonymity is fully probabilistically defined and never causes failure of anonymization.

Aggarwal proposed a probabilistic k-anonymity [1]. Their goal was the same as that of Pk-anonymity;
however, it requires a parametric assumption that the distribution of raw microdata is a parallel translation
of randomized microdata, and this seems to be rarely satisfied since a randomized distribution is generally
flatter than the prior distribution.

Soria-Comas and Domingo-Ferrer also proposed their probabilistic k-anonymity [19]. They applied it to
random swapping and micro-aggregation. The semantics of their anonymity is “no one can correctly link a
person to a record with a higher probability than 1/k,” and Pk-anonymity is stronger. Unfortunately, further
comparison is difficult since we could not find a sufficiently formal version of the definition.


On Privacy Measures Applicable to PRAM Aggarwal and Agrawal proposed a privacy measure based
on conditional differential entropy [2]. This measure requires both raw and randomized data to be evaluated,
unlike Pk-anonymity.

Agrawal et al. proposed the (s, ρ1, ρ2) privacy breach [3], which is based on probability and applicable to
retention-replacement perturbation. In contrast to k-anonymity, it does not take into account background
knowledge concerning raw data, that is, concerning quasi-identifier attributes.

Rebollo-Monedero et al. [16] proposed a t-closeness-like privacy criterion and a distortion criterion which
are applicable to randomization, and showed that PRAM can meet these criteria. Their work was aimed at
clarifying the privacy-distortion trade-off problem via information theory, in the area of attribute estimation.
Therefore, they did not mention whether PRAM can satisfy a well-known privacy notion such as k-anonymity.


On Microdata Release Algorithms Satisfying k-anonymity and DP Li et al. proposed a method
satisfying k-anonymity and (ε, δ)-DPS by combining random sampling and k-anonymization [11]. Since
(ε, δ)-DPS is based on (ε, δ)-DP, PRAM’s ε-DP is stronger.

Soria-Comas and Domingo-Ferrer proposed methods for t-closeness and ε-DP. However, a certain amount
of the adversary’s knowledge is assumed. Additionally, the methods cannot be applied when the adversary has
knowledge about all attributes. On the other hand, PRAM guarantees ε-DP regardless of the adversary’s
knowledge.


On Probabilistic Anonymization Algorithms Related to PRAM There have been several studies
[14,17,7,4] on local perturbation in which individuals anonymize their respective data before transferring them
to some central server.

Agrawal et al. proposed FRAPP [4]. They use a specific transition probability matrix called MASK
[17] and Cut and paste [8] to satisfy the ρ1-to-ρ2 privacy breach [7]. After that, Rastogi et al. [15] proposed the
αβ-algorithm that improves utility. These methods are closely related to PRAM, but they do not consider
whether PRAM can satisfy a well-known notion such as k-anonymity.
1.4 Organization of Paper

In Section 2, we discuss the notations used in the paper and preliminary definitions. In Section 3, we propose
our probabilistic k-anonymity, Pk-anonymity. In Section 4, we apply Pk-anonymity to PRAM and give
algorithms for PRAM to satisfy both ε-DP and Pk-anonymity. In Section 5, we describe the experimental
results regarding the utility of PRAM with parameters derived from the algorithms given in the previous
sections. Finally, we state the conclusions of this paper in Section 6.

2 Preliminaries 

2.1 Basic Settings 

We consider two scenarios of microdata release using randomization. One is the setting in which a database
administrator randomizes microdata (Figure 1(a)). The other is that in which individuals randomize their
own records (Figure 1(b)). The latter is better with respect to privacy. PRAM is applicable not only to the
former but also to the latter [3]. In contrast, k-anonymity can be applied only to the former. Thus,
our Pk-anonymity is applicable to both scenarios via PRAM.

Since a person randomizes his/her data in the latter scenario, no one has all the raw microdata. Therefore,
a person should be able to conduct appropriate randomization without another person’s record. Fortunately,
we can show that PRAM’s parameter satisfying Pk-anonymity and DP can be determined using only the
expected record count and metadata of attributes, as mentioned in Section 4.




Fig. 1. Two Scenarios of Microdata Release using PRAM 


2.2 Notation 

We treat a table-formed database as both private and released data. Since the record count is revealed at 
the same time that the data are released in an ordinary microdata release, we assume that the record count 
is public and static in theory. Furthermore, we consider attributes as one bundled direct product attribute 
since it is sufficient for theoretical discussion. 

Basically, we use the following notations. 

- $\mathcal{T}$: the set of all private tables

- $\tau$, $T$: a private table as an instance/random variable

- $\mathcal{T}'$: the set of all released tables

- $\tau'$, $T'$: a released table as an instance/random variable

- $\mathcal{R}$, $\mathcal{R}'$: the sets of all records in a private/released table

- $\mathcal{V}$, $\mathcal{V}'$: the sets of all values in a private/released table

- $A$: a transition probability matrix in PRAM

- $f_X$: the probability function of $X$, where $X$ is a random variable

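In the discussion of multi-attributes, we also use the following notations.

- $\mathcal{A}$, $\mathcal{A}'$: the sets of attributes in a private/released table

- $\mathcal{V}_a$, $\mathcal{V}'_{a'}$, where $a \in \mathcal{A}$ and $a' \in \mathcal{A}'$: the sets of values of each attribute in a
private/released table, i.e., $\mathcal{V} = \prod_{a \in \mathcal{A}} \mathcal{V}_a$ and $\mathcal{V}' = \prod_{a' \in \mathcal{A}'} \mathcal{V}'_{a'}$, where $\prod$ means the direct product

- $A_a$, where $a \in \mathcal{A}$: the transition probability matrix of each attribute

We consider a table $\tau \in \mathcal{T}$ (or $\tau' \in \mathcal{T}'$) as a map from $\mathcal{R}$ to $\mathcal{V}$ (or from $\mathcal{R}'$ to $\mathcal{V}'$). More formally, we define $\tau$ (or
$\tau'$) as follows.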

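Definition 1. (tables)

Let a record set $\mathcal{R}$ and a value set $\mathcal{V}$ be finite sets. Then, the following map $\tau$ is called a table on $(\mathcal{R}, \mathcal{V})$.

\[ \tau : \mathcal{R} \to \mathcal{V} \]

When we discuss a multi-attribute table, $\mathcal{V}$ is represented as $\prod_{a \in \mathcal{A}} \mathcal{V}_a$, where an attribute set $\mathcal{A}$ is a finite
set, each $\mathcal{V}_a$ is also a finite set for any $a \in \mathcal{A}$, and $\prod$ means the direct product.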

2.3 PRAM 

PRAM [9] was proposed by Kooiman et al. in 1997 as a privacy preserving method for microdata release.
It changes data according to a transition probability matrix. A transition probability matrix consists of the
probabilities with which each value in a private table will be changed into other specific (or the same) values.
$A_{u,v}$ denotes the probability that $u \in \mathcal{V}$ is changed into $v \in \mathcal{V}'$. For example, $A_{\text{male},\text{female}} = 0.25$ means that the
probability of “male → female” is 25%.

PRAM is quite a general method. Invariant PRAM [9], retention-replacement perturbation [3], etc., are
known as instantiations of it. In particular, retention-replacement perturbation is simple and convenient.


Retention-replacement Perturbation In retention-replacement perturbation, individuals probabilistically
replace their data with random data using a given retention probability p. First, data are retained with
probability p, and if the data are not retained, they will be replaced with a uniformly random value chosen from the
attribute domain. Note that even if data are not retained, there is still the possibility that the data will not
be changed, because the original value is included in the attribute domain along with the other values. For example,
for an attribute “sex,” when p = 0.5, “male” is retained with probability 1/2, and with the remaining 1/2
probability, it is replaced with a uniformly random value, namely, “female” or “male” (the same as the
original), each with probability 1/2 × 1/2 = 1/4. Eventually, the probability that “male”
changes into “female” is 1/4, and the probability that it does not change is 3/4. The lower the retention
probability, the better privacy is preserved; conversely, the lower the retention probability, the lower the utility.
These probabilities form the following transition probability matrix.

\[ \begin{pmatrix} 0.75 & 0.25 \\ 0.25 & 0.75 \end{pmatrix} \]


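Generally, the transition probability matrix $A_a$ of an attribute $a$ is written as

\[
(A_a)_{v_a, v'_a} =
\begin{cases}
p_a + \dfrac{1 - p_a}{|\mathcal{V}_a|} & \text{if } v_a = v'_a \\[2ex]
\dfrac{1 - p_a}{|\mathcal{V}_a|} & \text{otherwise}
\end{cases}
\]

where for any $v \in \mathcal{V}$ and $a \in \mathcal{A}$, $v_a \in \mathcal{V}_a$ is the element of $v$ corresponding to $a$, and $p_a$ is the retention
probability corresponding to $a$.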
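
As a concrete illustration of the matrix above, the following minimal Python sketch builds the retention-replacement transition matrix for one attribute and randomizes a single value with it. The function names `retention_replacement_matrix` and `pram_randomize` are illustrative and do not come from the paper; representing the matrix as a nested list indexed by positions in the attribute domain is likewise an assumption of this sketch.

```python
import random


def retention_replacement_matrix(domain_size, p):
    """Return the |V_a| x |V_a| transition matrix for retention probability p.

    Diagonal entries are p + (1 - p) / |V_a|; all other entries are (1 - p) / |V_a|.
    """
    if domain_size < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need domain_size >= 1 and 0 <= p <= 1")
    off_diagonal = (1.0 - p) / domain_size
    return [
        [p + off_diagonal if u == v else off_diagonal for v in range(domain_size)]
        for u in range(domain_size)
    ]


def pram_randomize(value_index, matrix, rng=random):
    """Draw a released value index according to row `value_index` of the matrix."""
    row = matrix[value_index]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


# The "sex" example from the text: two values, retention probability 0.5.
sex_matrix = retention_replacement_matrix(2, 0.5)
print(sex_matrix)  # [[0.75, 0.25], [0.25, 0.75]]
```

Passing a seeded `random.Random` instance as `rng` makes the randomization reproducible, which is convenient for experiments but should not be done in a real release.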
2.4 k-anonymity

k-anonymity [18,20] is a privacy notion that is applicable to table-formed databases and defined as
“for all database records, there are at least k records whose values are the same,” in other words, “no one
can narrow down a person’s record to fewer than k records.”

Using the notations in Section 2.2, we represent the definition of k-anonymity [20] as follows.

Definition 2. (k-anonymity)

For a positive integer k, a released table $\tau' \in \mathcal{T}'$ is said to satisfy k-anonymity (or to be k-anonymous),
if and only if it satisfies the following condition.

For any $r' \in \mathcal{R}'$, there are k or more $\hat{r}'$’s such that $\hat{r}' \in \mathcal{R}'$ and $\tau'(\hat{r}') = \tau'(r')$.

A released table $\tau'$ in the above definition represents all columns corresponding to quasi-identifier attributes
of an anonymized table.

However, the definition in [20] is problematic; i.e., there are some tables that satisfy k-anonymity but do
not achieve its aim. For example, a table generated by copying all of a private table’s records k times satisfies
k-anonymity but is obviously not safe. Therefore, we assume $|\mathcal{R}| = |\mathcal{R}'|$ to strengthen the above definition
in the discussion of k-anonymity in this paper.
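
The condition in Definition 2 can be checked mechanically by counting how often each quasi-identifier value occurs. The sketch below is a minimal illustration; the function name `is_k_anonymous` and the toy table are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(released_values, k):
    """Check Definition 2 on the quasi-identifier values of a released table.

    `released_values` lists the value tau'(r') of every record r' in R'.
    The table is k-anonymous iff every occurring value occurs at least k times.
    """
    counts = Counter(released_values)
    return all(count >= k for count in counts.values())


table = [("20s", "male"), ("20s", "male"), ("30s", "female"), ("30s", "female")]
print(is_k_anonymous(table, 2))  # True
print(is_k_anonymous(table, 3))  # False
```

Note that this check alone does not enforce the additional assumption that the private and released tables have the same number of records.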


2.5 Anonymization and Privacy Mechanisms 

We define anonymization and privacy mechanisms separately to discuss them formally. First, we define
anonymization.

Definition 3. (anonymization)

Let $\mathcal{R}$, $\mathcal{R}'$, $\mathcal{V}$ and $\mathcal{V}'$ be finite sets, $\mathcal{T}$ and $\mathcal{T}'$ be the sets of all tables on $(\mathcal{R}, \mathcal{V})$ and $(\mathcal{R}', \mathcal{V}')$, respectively,
and let $\pi$ be a map $\pi : \mathcal{R} \to \mathcal{R}'$. Then, for any $\tau \in \mathcal{T}$ and $\tau' \in \mathcal{T}'$, a map $\delta : \mathcal{T} \to (\mathcal{R} \to \mathcal{V}')$ is called
anonymization with $\pi$ from $\tau$ to $\tau'$ if and only if they satisfy

\[ \delta(\tau) = \tau' \circ \pi, \tag{1} \]

where the notation $\mathcal{X} \to \mathcal{Y}$ denotes the set of all maps from $\mathcal{X}$ to $\mathcal{Y}$ for any sets $\mathcal{X}$ and $\mathcal{Y}$.

Anonymization $\delta$ represents an anonymization algorithm such as perturbation, k-anonymization, etc. A
map $\pi$ represents an anonymous communication channel, the shuffling function, or another component which
hides the order of records in $\tau$. In this paper, we adopt the uniformly random permutation as $\pi$.¹

Privacy mechanisms involve not only $\delta$ but also $\pi$, $\mathcal{R}$, $\mathcal{R}'$, $\mathcal{V}$ and $\mathcal{V}'$, and random variables are introduced
to extend the above definitions to probabilistic ones. Random variables corresponding to $\tau$, $\tau'$, $\pi$, and $\delta$ are
denoted by $T$, $T'$, $\Pi$, and $\Delta$, respectively. We assume $T$, $\Pi$, and $\Delta$ are mutually independent as probabilistic
events, while $T'$ is dependent on the other three random variables.

Definition 4. (privacy mechanisms)

Let $\mathcal{R}$, $\mathcal{R}'$, $\mathcal{V}$, $\mathcal{V}'$, $\mathcal{T}$, and $\mathcal{T}'$ be the same as in Definition 3, and let $T$, $T'$, $\Pi$, and $\Delta$ be random variables
on $\mathcal{T}$, $\mathcal{T}'$, $\mathcal{R} \to \mathcal{R}'$, and $\mathcal{T} \to (\mathcal{R} \to \mathcal{V}')$, respectively, such that $T$, $\Pi$, and $\Delta$ are mutually independent as
probabilistic events, where the notation $\mathcal{X} \to \mathcal{Y}$ denotes the set of all maps from $\mathcal{X}$ to $\mathcal{Y}$ for any sets $\mathcal{X}$ and $\mathcal{Y}$.
Then, the 6-tuple $(\mathcal{R}, \mathcal{V}, \mathcal{R}', \mathcal{V}', \Pi, \Delta)$ is called a privacy mechanism from $T$ to $T'$ if and only if they satisfy
the following equation.

\[ \Delta(T) = T' \circ \Pi \]

¹ A map $\pi$ is essential for anonymization. For example, if the first record in the private table is to be the first record
in the released table, identification is trivial.
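
To make the relation between the anonymization step and the record-hiding permutation more tangible, the following Python sketch composes the two. It is only an illustration under stated assumptions: the anonymization is applied record by record (as in PRAM, whereas the definitions above allow any map on whole tables), tables are plain lists indexed by record, and the names `release` and `randomize` are hypothetical.

```python
import random


def release(private_values, randomize, rng):
    """Sketch of Delta(T) = T' o Pi: randomize each record, then hide the order.

    private_values[r] is tau(r); `randomize(value, rng)` plays the role of a
    sampled delta applied record-wise; the returned list is tau', indexed by
    released record ID, together with the permutation that was used.
    """
    n = len(private_values)
    pi = list(range(n))
    rng.shuffle(pi)  # pi[r] is the released record ID of individual r
    released = [None] * n
    for r, value in enumerate(private_values):
        released[pi[r]] = randomize(value, rng)  # tau'(pi(r)) = delta(tau)(r)
    return released, pi


rng = random.Random(0)
# With the identity as `randomize`, only the record order is hidden.
released, pi = release(["a", "b", "c"], lambda value, rng: value, rng)
```

In an actual release the permutation `pi` would of course be discarded rather than returned; it is exposed here only so that the defining equation can be checked.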
2.6 Differential Privacy on PRAM


Dwork proposed DP [6] in 2006. Its meaning is that “a (statistical) output does not change much even if a database
is changed with respect to at most one person.” Since it can be satisfied regardless of adversaries, it is being
widely studied.

Differential privacy is defined with a real number parameter ε.

Definition 5. (ε-DP)

Let $\mathcal{D}$ be a set of databases and d be a non-negative integer. A privacy mechanism $\mathcal{K} : \mathcal{D} \to \mathbb{R}^d$ is a
probabilistic algorithm, and ε is a (small) positive real number. We say $\mathcal{K}$ gives ε-DP if and only if for any
$S \subseteq \mathrm{Range}(\mathcal{K})$ and any pair $D_1$, $D_2$ of databases “differing at most by 1 element,” the following condition is
satisfied.

\[ \Pr[\mathcal{K}(D_1) \in S] \le \exp(\varepsilon) \Pr[\mathcal{K}(D_2) \in S] \tag{2} \]

Note that what “databases” and “differing at most by 1 element” mean remains open to interpretation.

Differential privacy is usually used as a privacy notion on interactive statistical databases. However,
PRAM is known to satisfy ε-DP for the SQL-style query “select * from τ” [12]. The query obviously
represents the release of microdata. We introduce the known result [12] and discuss ε-DP on PRAM in
addition to Pk-anonymity.

PRAM satisfies ε-DP with the following parameters [12].


Theorem 1. For any PRAM mechanism $\Delta$ whose transition probability matrix is denoted by $A$, $\Delta$ gives
ε-DP with the following ε.

\[ \varepsilon = \ln \max_{u, v \in \mathcal{V},\ v' \in \mathcal{V}'} \frac{A_{u,v'}}{A_{v,v'}} \]


The theorem has already been shown, but the existing proof may not be rigorous and does not suit our notations. Therefore, we
give another proof of Theorem 1 in Appendix A. We show a multi-attribute representation of Theorem 1
below. Simply put, ε becomes the sum of each attribute’s ε.


Corollary 1. For any PRAM mechanism $\Delta$ whose transition probability matrices are $A_a$ for each attribute
$a \in \mathcal{A}$, $\Delta$ gives ε-DP with the following ε.

\[ \varepsilon = \sum_{a \in \mathcal{A}} \ln \max_{u, v \in \mathcal{V}_a,\ v' \in \mathcal{V}'_a} \frac{(A_a)_{u,v'}}{(A_a)_{v,v'}} \]

Regarding retention-replacement perturbation, ε is evaluated as follows.

Corollary 2. For any retention-replacement perturbation $\Delta$ whose retention probabilities of each attribute
are $p_a$, $\Delta$ gives ε-DP with the following ε.

\[ \varepsilon = \sum_{a \in \mathcal{A}} \ln \frac{1 + (|\mathcal{V}_a| - 1) p_a}{1 - p_a} \tag{3} \]
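
The two ways of evaluating ε can be cross-checked numerically. The sketch below computes ε directly from a transition matrix as the logarithm of the largest within-column ratio (Theorem 1), and from the closed form obtained by substituting the retention-replacement matrix of Section 2.3 into that ratio, which gives $(1 + (|\mathcal{V}_a| - 1) p_a) / (1 - p_a)$ per attribute. The function names and the handling of zero entries are assumptions of this sketch, not part of the paper.

```python
import math


def epsilon_from_matrix(matrix):
    """Theorem 1: ln of the largest ratio A[u][v'] / A[v][v'] over rows u, v."""
    worst = 1.0
    for column in zip(*matrix):  # one column per released value v'
        largest, smallest = max(column), min(column)
        if largest == 0.0:
            continue  # v' is never released, so it imposes no constraint
        if smallest == 0.0:
            return math.inf  # some input never yields v': no finite epsilon
        worst = max(worst, largest / smallest)
    return math.log(worst)


def epsilon_retention_replacement(domain_sizes, retention_probs):
    """Closed form per attribute: ln((1 + (|V_a| - 1) p_a) / (1 - p_a)), summed."""
    total = 0.0
    for size, p in zip(domain_sizes, retention_probs):
        if p >= 1.0:
            return math.inf  # data are always retained, so nothing is hidden
        total += math.log((1.0 + (size - 1) * p) / (1.0 - p))
    return total


sex_matrix = [[0.75, 0.25], [0.25, 0.75]]  # the p = 0.5 example from Section 2.3
print(epsilon_from_matrix(sex_matrix))              # ln 3, about 1.0986
print(epsilon_retention_replacement([2], [0.5]))    # ln 3, about 1.0986
```

For several attributes, the matrix-based value would be computed per attribute and summed, mirroring the multi-attribute statement.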


3 Pk-anonymity

As the name suggests, k-anonymity represents anonymity among privacy notions. It is known that satisfying
only anonymity is not enough to preserve privacy [13]; thus, further privacy notions that prevent attribute
estimation were developed after k-anonymity. However, this never means that “anonymity is unnecessary.”
These stronger privacy notions rely on the assumption that k-anonymity has already been satisfied. Therefore,
as with deterministic microdata release, we consider anonymity as the first privacy requirement
in randomization-based microdata release. Regarding randomization, however, anonymity has not yet been
clarified. This is a critical problem that should be solved as soon as possible.
3.1 Problem with k-anonymity on Randomization

We now explain what occurs when one applies k-anonymity directly to a randomized table. Imagine that
one randomizes all records’ quasi-identifiers uniformly at random. Furthermore, suppose that the resulting
table happens to have a record whose data are unique. The randomized table does not satisfy k-anonymity
because it has a unique record. However, an adversary cannot identify anyone’s record (without knowledge of
sensitive attributes) since uniformly random values provide no information. In other words, the table should
be considered as fully anonymous, although the table does not satisfy k-anonymity. Therefore, we need a
new definition of k-anonymity applicable to randomization.

3.2 Intuitive Requirement 

To apply k-anonymity to randomization, we have to determine what kind of notion we should construct.
Intuitively, “no one can choose the correct record of a person with more than 1/k probability” is the likely choice.
However, we have to take into account an adversary’s incorrect presumption. Regarding privacy, the problem
is not only the leakage of correct information, but also the creation of incorrect information about a person. Since
a person does not wish to reveal correct private information, neither the person nor the administrator of
the database can resolve the adversary’s misconception. Therefore, we require a stronger sentence, “no one
estimates which person the record came from with more than 1/k probability.” Note that this second sentence
implies the first sentence (no one can choose...), because an adversary who correctly chooses the record of
a person with probability 1/k is able to estimate the record at confidence 1/k.

3.3 Background Knowledge of Adversary 

In the definition of k-anonymity, there is no adversary, and this definition is described as a simple condition
to be satisfied in a table. This is convenient for measuring k-anonymity. At the same time, however, it makes
the meaning of privacy unclear.

Therefore, there is an adversary in our model of Pk-anonymity. The probability of linkage varies
according to the background knowledge of the adversary. In the Pk-anonymity model, an adversary’s
background knowledge is represented as a probability function $f_T$² on the private table. Pk-anonymity requires
of the privacy mechanism that the probability of linkage be bounded by 1/k for all $f_T$. This means that we
deal with an adversary who has arbitrary knowledge about the private table: the adversary might know the
private table itself and incorrect private tables.

We note that even in the extreme case where the adversary knows the private table itself, Pk-anonymity
can be satisfied by using the randomness in the privacy mechanisms. Of course, we assume that the adversary
knows the released table, the anonymization algorithm, and the parameters used in the system in addition to
the background knowledge.

3.4 Definition of Pk-anonymity

We define our new anonymity, Pk-anonymity, and Pk-anonymization, which is a privacy mechanism that
always satisfies Pk-anonymity.

First, we define an attack by an adversary with background knowledge, which is represented as an
estimation by the following probability, where $\tau'$ is a released table, $\Pi$ is a uniformly random injective map
from $\mathcal{R}$ to $\mathcal{R}'$, $r \in \mathcal{R}$, $r' \in \mathcal{R}'$ and $\Delta(T) = T' \circ \Pi$.

\[ \Pr[\Pi(r) = r' \mid T' = \tau'] \tag{4} \]

The term $\mathcal{R}$ represents a set of individuals, and $\mathcal{R}'$ represents a set of record IDs (not necessarily explicit
IDs; in anonymized microdata, an ID may be just a location in storage). The randomness of $\Pi$ represents that
“an adversary has no knowledge of the linkage between individuals and the records in $\tau'$.” Taken together,
the above probability represents the following probability from the standpoint of an adversary who saw $\tau'$.

Pr[a person r’s record in $\tau'$ is r']

We denote the above probability as $\mathcal{E}(f_T, \tau', r, r')$.

² It means that the adversary knows that the private table is $\tau_1$ with probability $x_1$, $\tau_2$ with $x_2$ and so on. It is not
a distribution of values in a specific table, but the distribution on the space of all tables.

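Next, we define Pk-anonymity.

Definition 6. (Pk-anonymity)

Let $\mathcal{R}$, $\mathcal{V}$, $\mathcal{R}'$, and $\mathcal{V}'$ be finite sets, and $\Pi$ and $\Delta$ be random variables on $\mathcal{R} \to \mathcal{R}'$ and $\mathcal{T} \to (\mathcal{R} \to \mathcal{V}')$,
respectively, where $\mathcal{T}$ denotes the set of tables on $(\mathcal{R}, \mathcal{V})$ and the notation $\mathcal{X} \to \mathcal{Y}$ denotes the set of all maps
from $\mathcal{X}$ to $\mathcal{Y}$ for any sets $\mathcal{X}$ and $\mathcal{Y}$. Furthermore, let $\boldsymbol{\Delta}$ denote the 6-tuple $(\mathcal{R}, \mathcal{V}, \mathcal{R}', \mathcal{V}', \Pi, \Delta)$.

Then, for any real number $k \ge 1$ and a table $\tau'$ on $(\mathcal{R}', \mathcal{V}')$, a pair $(\boldsymbol{\Delta}, \tau')$ is said to satisfy Pk-anonymity
(or to be Pk-anonymous) if and only if for any random variables $T$ of tables on $(\mathcal{R}, \mathcal{V})$ and $T'$ of tables on
$(\mathcal{R}', \mathcal{V}')$ such that $\boldsymbol{\Delta}$ is a privacy mechanism from $T$ to $T'$, any record $r \in \mathcal{R}$ of the private table $T$ and any
record $r' \in \mathcal{R}'$ of the released table $\tau'$, the following inequality is satisfied.

\[ \Pr[\Pi(r) = r' \mid T' = \tau'] \le \frac{1}{k} \]

Definition 7. (Pk-anonymization algorithms)

Let $\mathcal{R}$, $\mathcal{V}$, $\mathcal{R}'$, $\mathcal{V}'$, $\Pi$, $\Delta$, and $\boldsymbol{\Delta}$ be the same as in Definition 6, and let $\mathcal{T}'$ denote the set of all tables on $(\mathcal{R}', \mathcal{V}')$.

Then, for any real number $k \ge 1$, $\boldsymbol{\Delta}$ is said to be a Pk-anonymization if and only if $(\boldsymbol{\Delta}, \tau')$ satisfies
Pk-anonymity for any released table $\tau' \in \mathcal{T}'$ such that there exists a private table $\tau$ on $(\mathcal{R}, \mathcal{V})$ which satisfies
$\Pr[\Delta(\tau) = \tau' \circ \Pi] \neq 0$.

Hereafter, we treat only $\Delta$ within the 6-tuple of a privacy mechanism $(\mathcal{R}, \mathcal{V}, \mathcal{R}', \mathcal{V}', \Pi, \Delta) = \boldsymbol{\Delta}$; thus, we do not differentiate
$\Delta$ and $\boldsymbol{\Delta}$.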

The direct meaning of Pk-anonymity is “no one estimates which person the record came from with more than
1/k probability.” Intuitively, it seems to be similar to “no one can narrow down a person’s record to fewer
than k records,” which is the intuitive concept of k-anonymity. This intuitive similarity can also be confirmed
mathematically. Furthermore, as far as deterministic anonymization algorithms, such as k-anonymization
algorithms, are concerned, the two anonymity notions can be shown to be equivalent to each other. Therefore,
we say k-anonymity is satisfied in a randomized table if Pk-anonymity is satisfied in the table.

Theorem 2. For any positive integer k, privacy mechanism $\Delta$, and released table $\tau'$, the following relation
holds if $\Delta$ is deterministic, i.e., for any $\tau \in \mathcal{T}$, there exists a unique anonymized table $\hat{\tau}$ such that $\Delta(\tau) = \hat{\tau}$.

\[ \tau' \text{ is k-anonymous} \iff (\Delta, \tau') \text{ is Pk-anonymous} \]

This theorem represents the equivalence of Pk-anonymity and k-anonymity for deterministic
anonymization algorithms, which are the field in which k-anonymity is applicable. Therefore, Pk-anonymity is deemed
an extension of k-anonymity.
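
The equivalence can be observed on a toy example by brute force. The sketch below assumes the strongest adversary considered in Section 3.3 (one who knows the private table exactly) and a deterministic anonymization, so that the anonymized value of every individual is known; it then enumerates all permutations consistent with the released table and computes the linkage probability exactly. The function name `linkage_probability` and the example tables are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations


def linkage_probability(anonymized, released, r, r_prime):
    """Pr[Pi(r) = r' | T' = tau'] for a uniformly random permutation Pi.

    anonymized[i] is the anonymized value of individual i (deterministic
    anonymization of a private table the adversary knows exactly);
    released[j] is the value of released record j.
    """
    n = len(anonymized)
    # Keep only the permutations that could have produced the released table.
    consistent = [
        pi for pi in permutations(range(n))
        if all(released[pi[i]] == anonymized[i] for i in range(n))
    ]
    if not consistent:
        raise ValueError("released table cannot arise from this anonymized table")
    hits = sum(1 for pi in consistent if pi[r] == r_prime)
    return Fraction(hits, len(consistent))


anonymized = ("a", "a", "b", "b", "b")  # indexed by individual
released = ("b", "a", "b", "a", "b")    # indexed by released record ID
worst = max(
    linkage_probability(anonymized, released, r, rp)
    for r in range(5) for rp in range(5)
)
print(worst)  # 1/2
```

The released table is 2-anonymous but not 3-anonymous, and the largest linkage probability is exactly 1/2, in line with the theorem. The enumeration is factorial in the number of records, so this is only usable for very small examples.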

(Proof of Theorem 2)

This theorem is shown with the following two lemmas.

Lemma 1. For any positive integer k, if a released table $\tau'$ is k-anonymous, then $(\Delta, \tau')$ is Pk-anonymous
for any privacy mechanism $\Delta$.

Lemma 2. For any real number $t \ge 1$, positive integer k such that $k \le t$, any deterministic privacy
mechanism $\Delta$, and released table $\tau'$, if $(\Delta, \tau')$ is Pt-anonymous, then $\tau'$ is k-anonymous.





Roughly, Lemma 1 states that “k ⇒ Pk always,” and Lemma 2 states that “Pk ⇒ k if an anonymization
algorithm is deterministic.”

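(Proof of Lemma 1)

First, we use the notation $\sharp_{\tau'}(v')$ for the number of records in $\tau'$ whose value is $v'$, and say $r'$ is k-anonymous in $\tau'$ if $\sharp_{\tau'}(\tau'(r')) \ge k$. Then,
k-anonymity of $\tau'$ is represented as “$r'$ is k-anonymous in $\tau'$ for any $r' \in \mathcal{R}'$.”

As mentioned in Section 3, we show Lemma 1 and Lemma 2. Note that the following equality holds by
definition.

\[ \Delta(T) = T' \circ \Pi \tag{5} \]

We show that an estimation probability, $\mathcal{E}(f_T, \tau', r, r')$, is equal to or less than 1/k. For any background
knowledge $f_T : \mathcal{T} \to \mathbb{R}$, any $r \in \mathcal{R}$ and any $r' \in \mathcal{R}'$, the following equations hold.

\begin{align*}
\mathcal{E}(f_T, \tau', r, r')
&= \Pr[\Pi(r) = r' \mid T' = \tau'] \\
&= \frac{\Pr[\Pi(r) = r' \wedge T' = \tau']}{\Pr[T' = \tau']} \\
&= \frac{\Pr[\Pi(r) = r' \wedge \Delta(T) = \tau' \circ \Pi]}{\Pr[\Delta(T) = \tau' \circ \Pi]}
  && \text{(from Equation (5))} \\
&= \frac{\sum_{\delta,\, \tau \in \mathcal{T}} f_\Delta(\delta) f_T(\tau) \Pr[\Pi(r) = r' \wedge \delta(\tau) = \tau' \circ \Pi]}
        {\sum_{\delta,\, \tau \in \mathcal{T}} f_\Delta(\delta) f_T(\tau) \Pr[\delta(\tau) = \tau' \circ \Pi]}
  && \text{(since $T$, $\Delta$, and $\Pi$ are independent of each other)}
\end{align*}

We define two propositions $\Phi(\delta, \tau)$ and $\Xi(\delta, \tau)$ as

\begin{align*}
\Phi(\delta, \tau) &= [\text{There exists } \pi : \mathcal{R} \to \mathcal{R}' \text{ such that } \delta(\tau) = \tau' \circ \pi] \\
\Xi(\delta, \tau) &= [\Phi(\delta, \tau) \text{ and } (\delta(\tau))(r) = \tau'(r')]
\end{align*}

respectively. Since $\Pi$ is a uniformly random permutation, the following equations hold.

\[
\Pr[\delta(\tau) = \tau' \circ \Pi] =
\begin{cases}
\dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{|\mathcal{R}|!} & \text{(if $\Phi$ holds)} \\[2ex]
0 & \text{(otherwise)}
\end{cases}
\]

\begin{align*}
\Pr[\Pi(r) = r' \wedge \delta(\tau) = \tau' \circ \Pi]
&=
\begin{cases}
\dfrac{(\sharp_{\tau'}(\tau'(r')) - 1)! \prod_{v' \in \mathrm{Im}(\tau') \setminus \{\tau'(r')\}} \sharp_{\tau'}(v')!}{|\mathcal{R}|!} & \text{(if $\Xi$ holds)} \\[2ex]
0 & \text{(otherwise)}
\end{cases} \\
&=
\begin{cases}
\dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{\sharp_{\tau'}(\tau'(r')) \, |\mathcal{R}|!} & \text{(if $\Xi$ holds)} \\[2ex]
0 & \text{(otherwise)}
\end{cases}
\end{align*}

Therefore, the primary equation $\Pr[\Pi(r) = r' \mid T' = \tau']$ is transformed as

\begin{align*}
\Pr[\Pi(r) = r' \mid T' = \tau']
&= \frac{\sum_{\Xi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{\sharp_{\tau'}(\tau'(r')) \, |\mathcal{R}|!}}
        {\sum_{\Phi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{|\mathcal{R}|!}} \\
&\le \frac{\sum_{\Phi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{\sharp_{\tau'}(\tau'(r')) \, |\mathcal{R}|!}}
          {\sum_{\Phi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \sharp_{\tau'}(v')!}{|\mathcal{R}|!}}
  && \text{(since $\Xi \Rightarrow \Phi$)} \\
&= \frac{1}{\sharp_{\tau'}(\tau'(r'))} \le \frac{1}{k}
  && \text{(from k-anonymity)}
\end{align*}

□ (Lemma 1)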

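(Proof of Lemma 2)

In the proof we show the following contrapositive.

For any deterministic privacy mechanism $\Delta$, if $\tau'$ is not $k$-anonymous, then $(\Delta, \tau')$ is also not P$t$-anonymous.

We consider the background knowledge, $f_T$, satisfying $f_T(\tau) = 1$. Let $r' \in \mathcal{R}'$ be a record that is not
$k$-anonymous in $\tau'$ and that satisfies $r \in \pi^{-1}(r')$.

As in the proof of Lemma 1, the following equation holds.

\[
\mathcal{E}(f_T, \tau', r, r') =
\frac{\sum_{\psi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \#_{\tau'}(v')!}{\#_{\tau'}(\tau'(r'))\, |\mathcal{R}|!}}
     {\sum_{\varphi(\delta, \tau)} f_\Delta(\delta) f_T(\tau) \dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \#_{\tau'}(v')!}{|\mathcal{R}|!}}
\]

Since $\Delta$ is deterministic and $f_T(\tau) = 1$, we transform the above equation as follows.

\[
\mathcal{E}(f_T, \tau', r, r') =
\frac{\dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \#_{\tau'}(v')!}{\#_{\tau'}(\tau'(r'))\, |\mathcal{R}|!}}
     {\dfrac{\prod_{v' \in \mathrm{Im}(\tau')} \#_{\tau'}(v')!}{|\mathcal{R}|!}}
= \frac{1}{\#_{\tau'}(\tau'(r'))}
\]

We assume $r'$ is not $k$-anonymous; therefore, $\dfrac{1}{\#_{\tau'}(\tau'(r'))} > \dfrac{1}{k}$. Because $k$ does not exceed $t$, the estimate also exceeds $1/t$, so $(\Delta, \tau')$ is not P$t$-anonymous.

□ (Lemma 2)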


The above two lemmata immediately imply Theorem 2. 


□ 
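The equality used in the proof of Lemma 2 can be checked by brute force on a toy table. The following is a minimal sketch, not part of the original proof: it assumes a deterministic mechanism whose output is given directly as a list, enumerates every bijection $\pi$, and compares the resulting estimate with $1/\#_{\tau'}(\tau'(r'))$. The function name `estimate` and the example values are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def estimate(anonymized, released, r, r_prime):
    """Pr[Pi(r) = r' | delta(tau) = tau' o Pi] for a uniformly random bijection Pi.

    anonymized[s] is the value (delta(tau))(s) of private record s;
    released[s'] is the value tau'(s') of released record s'.
    """
    n = len(anonymized)
    consistent = 0  # bijections pi with delta(tau) = tau' o pi
    hits = 0        # ... that additionally satisfy pi(r) = r'
    for pi in permutations(range(n)):
        if all(anonymized[s] == released[pi[s]] for s in range(n)):
            consistent += 1
            if pi[r] == r_prime:
                hits += 1
    return Fraction(hits, consistent)


anonymized = ["a", "a", "b", "a"]
released = ["a", "b", "a", "a"]
count = Counter(released)  # count[v'] plays the role of #_{tau'}(v')

print(estimate(anonymized, released, 0, 0))  # 1/3
print(estimate(anonymized, released, 2, 1))  # 1
```

Here the value "a" occurs three times in the released table, so the estimate for a record with value "a" is 1/3, whereas the unique "b" record is identified with probability 1.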


Furthermore, k-anonymization and Pk-anonymization also have a similar equality.

Corollary 3. For any positive integer $k$ and privacy mechanism $\Delta$, if $\Delta$ is deterministic, the following
holds.

\[ \Delta \text{ is a } k\text{-anonymization} \iff \Delta \text{ is a P}k\text{-anonymization} \]

Through Theorem 2, we have seen that Pk-anonymity is an exact mathematical extension of k-anonymity.
Moreover, the intuitive meaning of k-anonymity, “no one can narrow down a person’s record to less than k
records,” is applicable from the following viewpoint. Under a privacy mechanism $\Delta$ and a certain released
table $\tau'$, an adversary’s estimation $\mathcal{E}(f_T, \tau', r, r')$ is $1/k$ or less for any $r \in \mathcal{R}$ and $r' \in \mathcal{R}'$, when $(\Delta, \tau')$ is
P$k$-anonymous. Then by definition, for any $k - 1$ records $\{r'_i\}_{0 \le i < k-1}$ in $\tau'$, the following relation holds.

\[ \sum_{0 \le i < k-1} \mathcal{E}(f_T, \tau', r, r'_i) \le \frac{k-1}{k} \]

This relation means that when one has chosen $k - 1$ records from $\tau'$, there is always a probability of at least $1/k$ that
the record of $r$ is not among these $k - 1$ records in $\tau'$. This precisely means that “no one can narrow down a person’s record to
less than k records.”


Remember that an adversary is considered as background knowledge and a distribution. In the field of
cryptography, an adversary is often represented as an algorithm. We show that an adversary represented as
a probabilistic algorithm $M$ that takes $(\tau', r)$ as inputs cannot select the record of $r$ in a released table with a
probability higher than $1/k$.

Proposition 1. For any P$k$-anonymization $\Delta$, $\tau \in \mathcal{T}$, $\tau' \in \mathcal{T}'$ such that $\Delta(\tau) = \tau' \circ \Pi$, $r \in \mathcal{R}$ and
probabilistic algorithm $M$ that takes $\tau'$ and $r$ as inputs, $M$ does not select $r' \in \mathcal{R}'$ such that $\Pi(r) = r'$ with a
probability higher than $1/k$.

(Proof of Proposition 1)

Let $f_T$ be the following probability function.

\[
f_T(\tilde{\tau}) = \Pr[T = \tilde{\tau}] =
\begin{cases}
1 & \text{if } \tilde{\tau} = \tau \\
0 & \text{otherwise}
\end{cases}
\]

Under this $f_T$, the probability $\Pr[\Pi(r) = r' \mid T' = \tau']$ is not only an adversary’s estimate, but also the true
probability. On the other hand, it is $1/k$ or smaller by Pk-anonymity; therefore, no function selects $r'$ with
a probability higher than $1/k$, and $M$ is only a function.

□


4 Applying Pk-anonymity (and DP) to PRAM

We apply Pk-anonymity to PRAM. First, we show a theorem on general PRAM for calculating k. Next, we
describe a more concrete formula for the retention-replacement perturbation introduced in Section 2.3. Finally,
combining an existing result, we propose an algorithm to satisfy both Pk-anonymity and ε-DP.

We assume $V = V'$. The privacy mechanism $\Delta$ is defined along with PRAM, i.e., defined for any $r \in \mathcal{R}$
and $v' \in V'$, as follows.

\[ f_{(\Delta(\tau))(r)}(v') = A_{\tau(r), v'} \]

We call such a privacy mechanism a PRAM mechanism.

Theorem 3. (Pk-anonymity on PRAM)

A PRAM mechanism whose transition probability matrix is $A$ is a P$k$-anonymization if and only if $k$
satisfies the following.

\[ k \le 1 + (|\mathcal{R}| - 1) \min_{u, v \in V,\; u', v' \in V'} \frac{A_{u,v'}\, A_{v,u'}}{A_{u,u'}\, A_{v,v'}} \]
Note that this theorem shows the tight bound of $k$. This theorem is shown by evaluating the maximum
probability of estimation $\mathcal{E}(f_T, \tau', r, r')$ on $r \in \mathcal{R}$, $r' \in \mathcal{R}'$, $\tau' \in \mathcal{T}'$ and background knowledge $f_T : \mathcal{T} \to \mathbb{R}$.
The probability takes the maximum value in the following case.

— All values in private table $\tau$ happened to be retained in released table $\tau'$

— There are only two values in $\tau$ and $\tau'$: one is $\tau(r)$ and the other is $v \in V$, which satisfies $\tau(s) = v$ for
any record $s \ne r$

— $\tau(r)$ and $v$ shown above are different from each other in all attributes

— The adversary knows all about the private table, i.e., $f_T(\tilde{\tau}) = 1$ if $\tilde{\tau} = \tau$ and $f_T(\tilde{\tau}) = 0$ otherwise

With this fact, $k$ can be derived by substituting each parameter in estimate $\mathcal{E}(f_T, \tau', r, r')$.
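The bound of Theorem 3 is straightforward to evaluate for a concrete transition matrix. The sketch below is illustrative only: the helper name `pram_k_bound` is hypothetical, and it assumes a square matrix ($V = V'$) with strictly positive entries so that every ratio is defined.

```python
from fractions import Fraction
from itertools import product


def pram_k_bound(A, n_records):
    """Largest k allowed by Theorem 3 for transition matrix A and |R| = n_records.

    Assumes V = V' (square A) and strictly positive entries.
    """
    m = len(A)
    min_ratio = min(
        (A[u][v2] * A[v][u2]) / (A[u][u2] * A[v][v2])
        for u, v, u2, v2 in product(range(m), repeat=4)
    )
    return 1 + (n_records - 1) * min_ratio


# Two values; each value is kept with probability 3/4 and flipped with probability 1/4.
A = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
print(pram_k_bound(A, 10))  # 2
```

With ten records the minimum ratio is $(1/4)^2 / (3/4)^2 = 1/9$, so the mechanism is a P$k$-anonymization for every $k \le 2$.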

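(Proof of Theorem 3)

We show this theorem by evaluating the maximum of $\mathcal{E}(f_T, \tau', r, r')$ on $r \in \mathcal{R}$, $r' \in \mathcal{R}'$, $\tau' \in \mathcal{T}'$ and
$f_T : \mathcal{T} \to \mathbb{R}$. Similar to the proof of Lemma 1, the following equation holds.

\[
\begin{aligned}
\mathcal{E}(f_T, \tau', r, r')
&= \Pr[\Pi(r) = r' \mid T' = \tau'] \\
&= \frac{\Pr[\Delta(T) = \tau' \circ \Pi \wedge \Pi(r) = r']}{\Pr[\Delta(T) = \tau' \circ \Pi]}
 && \text{(from Equation (5))} \\
&= \frac{\sum_{\tau \in \mathcal{T}} f_T(\tau) \Pr[\Delta(\tau) = \tau' \circ \Pi \wedge \Pi(r) = r']}
        {\sum_{\tau \in \mathcal{T}} f_T(\tau) \Pr[\Delta(\tau) = \tau' \circ \Pi]}
\end{aligned}
\]

Next we show which $f_T$ maximizes the above estimation probability. In other words, we show which
adversary can guess the record of a person with the highest confidence.

Lemma 3. Let $\mathbb{R}^{n+}$ be the set of non-zero $n$-dimensional vectors whose elements are non-negative real numbers.
Then for any vectors $a, b \in \mathbb{R}^{n+}$, the maximum of

\[ g(x) \stackrel{\mathrm{def}}{=} \frac{b \cdot x}{a \cdot x} = \frac{\sum_{i<n} b_i x_i}{\sum_{i<n} a_i x_i} \]

on a variable $x \in \mathbb{R}^{n+}$ is $\max_{i<n} \dfrac{b_i}{a_i}$, and $x$ satisfies $x_i = 0$
for any $i < n$ such that $\dfrac{b_i}{a_i} \ne \max_{i<n} \dfrac{b_i}{a_i}$.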

(Proof of Lemma 3)

Since $g(x)$ is invariant under scalar multiplication of $x$, it is sufficient to find the maximum in some $Y \subset \mathbb{R}^{n+}$
such that, for any $x \in \mathbb{R}^{n+}$, there exist $\alpha \in \mathbb{R}$ and $y \in Y$ that satisfy $\alpha y = x$. By taking $Y$ as a plane, we can
find that the maximum exists because $Y$ is a bounded closed set.

Next we have that

\[ x_i = 0 \quad \text{or} \quad \frac{\partial g(x)}{\partial x_i} = 0 \]

holds for each element $x_i$ of $x \in \mathbb{R}^{n+}$ that gives the maximum of $g(x)$. Otherwise, changing $x_i$ slightly would increase the
value of $g(x)$, which contradicts that $g(x)$ is the maximum. Because of this fact, and because $x$ is not
a zero vector, there must exist at least one $i$ such that

\[ \frac{\partial g(x)}{\partial x_i} = 0. \]

Finally, this partial derivative is found to be

\[ \frac{\partial g(x)}{\partial x_i} = \frac{(a \cdot x) b_i - (b \cdot x) a_i}{(a \cdot x)^2}; \]

then

\[ \frac{\partial g(x)}{\partial x_i} = 0 \iff g(x) = \frac{b_i}{a_i} \]

holds. Therefore, $i$, which satisfies $\dfrac{\partial g(x)}{\partial x_i} = 0$, must be an $i$ giving the maximum $\dfrac{b_i}{a_i}$; all other elements are 0, and
the maximum of $g(x)$ is $\max_{i<n} \dfrac{b_i}{a_i}$.

□ (Lemma 3)
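Lemma 3 can be illustrated numerically. The sketch below is an informal check rather than a proof: the vectors `a` and `b` are arbitrary hypothetical values with positive entries, and random sampling only suggests (it does not establish) that $g(x)$ never exceeds the largest coordinate ratio.

```python
import random


def g(a, b, x):
    """g(x) = (b . x) / (a . x)."""
    return sum(bi * xi for bi, xi in zip(b, x)) / sum(ai * xi for ai, xi in zip(a, x))


rng = random.Random(0)
a = [2.0, 1.0, 4.0]
b = [1.0, 3.0, 2.0]
best_ratio = max(bi / ai for ai, bi in zip(a, b))  # 3.0, attained at index 1

# Random non-negative, non-zero x should never exceed the best coordinate ratio.
samples = [[rng.random() + 1e-9 for _ in a] for _ in range(10_000)]
largest_sampled = max(g(a, b, x) for x in samples)

# The unit vector on the maximizing coordinate attains the bound.
attained = g(a, b, [0.0, 1.0, 0.0])
```

The value $g(x)$ is a weighted average of the ratios $b_i / a_i$ with weights $a_i x_i$, which is why placing all weight on the best coordinate attains the maximum.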


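From the above lemma, when $\mathcal{E}(f_T, \tau', r, r')$ takes the maximum, $\tau$ makes the following formula maximum,

\[ \frac{\Pr[\Delta(\tau) = \tau' \circ \Pi \wedge \Pi(r) = r']}{\Pr[\Delta(\tau) = \tau' \circ \Pi]} \tag{6} \]

and the maximum of Formula (6) is equal to that of $\mathcal{E}(f_T, \tau', r, r')$.

Since $\Pi$ is a uniformly random permutation, Formula (6) is transformed as follows.

\[
\begin{aligned}
\text{Formula (6)}
&= \frac{\dfrac{1}{|\mathcal{R}|!} \sum_{\pi(r) = r'} \Pr[\Delta(\tau) = \tau' \circ \pi]}
        {\dfrac{1}{|\mathcal{R}|!} \sum_{\pi} \Pr[\Delta(\tau) = \tau' \circ \pi]}
 = \frac{\sum_{\pi(r) = r'} \Pr[\Delta(\tau) = \tau' \circ \pi]}{\sum_{\pi} \Pr[\Delta(\tau) = \tau' \circ \pi]} \\
&= \frac{\sum_{\pi(r) = r'} \prod_{s \in \mathcal{R}} \Pr[(\Delta(\tau))(s) = \tau'(\pi(s))]}
        {\sum_{\pi} \prod_{s \in \mathcal{R}} \Pr[(\Delta(\tau))(s) = \tau'(\pi(s))]}
 && \text{(since $\Delta$ acts independently on each record)}
\end{aligned}
\]

Let a matrix $A^{\tau,\tau'}$ be

\[ A^{\tau,\tau'}_{s,s'} \stackrel{\mathrm{def}}{=} \Pr[(\Delta(\tau))(s) = \tau'(s')] \]

for any $s \in \mathcal{R}$, $s' \in \mathcal{R}'$. Then, the above formula is represented as follows.

\[
F(A^{\tau,\tau'}) \stackrel{\mathrm{def}}{=}
\frac{\sum_{\pi(r) = r'} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}}
     {\sum_{\pi} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}}
\tag{$*$}
\]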


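We would rather find the minimum of the reciprocal than the maximum of $F(A^{\tau,\tau'})$ itself. In the case of
$|\mathcal{R}| \ge 2$, the reciprocal is transformed as follows.

\[
\begin{aligned}
\frac{1}{F(A^{\tau,\tau'})}
&= \frac{\sum_{\pi} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}}
        {\sum_{\pi(r) = r'} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}} \\
&= \frac{\sum_{\pi(r) = r'} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}
        + \sum_{t \ne r} \sum_{t' \ne r'} \sum_{\pi(t) = r',\, \pi(r) = t'} A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,t'} \prod_{s \ne t, r} A^{\tau,\tau'}_{s,\pi(s)}}
        {A^{\tau,\tau'}_{r,r'} \sum_{\pi(r) = r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}} \\
&= 1 + \frac{\sum_{t \ne r} \sum_{t' \ne r'} A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,t'} \sum_{\pi(t) = r',\, \pi(r) = t'} \prod_{s \ne t, r} A^{\tau,\tau'}_{s,\pi(s)}}
            {A^{\tau,\tau'}_{r,r'} \sum_{\pi(r) = r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}} \\
&= 1 + \frac{\sum_{t \ne r} \sum_{t' \ne r'} A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,t'} \sum_{\pi(r) = r',\, \pi(t) = t'} \prod_{s \ne t, r} A^{\tau,\tau'}_{s,\pi(s)}}
            {A^{\tau,\tau'}_{r,r'} \sum_{\pi(r) = r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}}
 && \text{(exchanging the images of $r$ and $t$)} \\
&= 1 + \frac{\sum_{t \ne r} \sum_{\pi(r) = r'} A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,\pi(t)} \prod_{s \ne t, r} A^{\tau,\tau'}_{s,\pi(s)}}
            {A^{\tau,\tau'}_{r,r'} \sum_{\pi(r) = r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}} \\
&= 1 + \frac{\sum_{\pi(r) = r'} \sum_{t \ne r} \dfrac{A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,\pi(t)}}{A^{\tau,\tau'}_{t,\pi(t)}} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}}
            {A^{\tau,\tau'}_{r,r'} \sum_{\pi(r) = r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}}
\end{aligned}
\]

We show the following lemma.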

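Lemma 4. Let $g_i$ and $h_i$ be $g_i, h_i : \mathbb{R}^n \to \mathbb{R}$ for any index $i \in I$, where $I$ is a set of indices. If some $x \in \mathbb{R}^n$
and $z \in \mathbb{R}$ satisfy $\dfrac{h_i(x)}{g_i(x)} = \min_{x' \in \mathbb{R}^n} \dfrac{h_i(x')}{g_i(x')} = z$ for any $i \in I$, then the following equation is satisfied.

\[
\min_{x' \in \mathbb{R}^n} \frac{\sum_{i \in I} h_i(x')}{\sum_{i \in I} g_i(x')} = \frac{\sum_{i \in I} h_i(x)}{\sum_{i \in I} g_i(x)}
\]

(Proof of Lemma 4)

From the assumption of the lemma, $h_i(x') \ge z\, g_i(x')$ holds for all $i \in I$ and any $x' \in \mathbb{R}^n$. Therefore,

\[
\frac{\sum_{i \in I} h_i(x')}{\sum_{i \in I} g_i(x')} \ge z = \frac{\sum_{i \in I} h_i(x)}{\sum_{i \in I} g_i(x)};
\]

then

\[
\min_{x' \in \mathbb{R}^n} \frac{\sum_{i \in I} h_i(x')}{\sum_{i \in I} g_i(x')} = \frac{\sum_{i \in I} h_i(x)}{\sum_{i \in I} g_i(x)}
\]

holds.

□ (Lemma 4)

Let $h_\pi(A^{\tau,\tau'})$ and $g_\pi(A^{\tau,\tau'})$ be

\[
h_\pi(A^{\tau,\tau'}) = \sum_{t \ne r} \frac{A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,\pi(t)}}{A^{\tau,\tau'}_{t,\pi(t)}} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)},
\qquad
g_\pi(A^{\tau,\tau'}) = A^{\tau,\tau'}_{r,r'} \prod_{s \ne r} A^{\tau,\tau'}_{s,\pi(s)}
\]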
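for any $\pi : \mathcal{R} \to \mathcal{R}'$. Thanks to Lemma 4, it is sufficient to consider $\dfrac{h_\pi(A^{\tau,\tau'})}{g_\pi(A^{\tau,\tau'})}$ only. Because it is transformed
into

\[
\frac{h_\pi(A^{\tau,\tau'})}{g_\pi(A^{\tau,\tau'})} = \frac{1}{A^{\tau,\tau'}_{r,r'}} \sum_{t \ne r} \frac{A^{\tau,\tau'}_{t,r'} A^{\tau,\tau'}_{r,\pi(t)}}{A^{\tau,\tau'}_{t,\pi(t)}},
\]

it takes the minimum for any $\pi : \mathcal{R} \to \mathcal{R}'$ when $\tau$ and $\tau'$ are as follows.

There exist $u, v \in V$ and $u', v' \in V'$ such that

\[
\frac{A_{u,v'} A_{v,u'}}{A_{u,u'} A_{v,v'}} = \min_{\hat{u}, \hat{v} \in V,\; \hat{u}', \hat{v}' \in V'} \frac{A_{\hat{u},\hat{v}'} A_{\hat{v},\hat{u}'}}{A_{\hat{u},\hat{u}'} A_{\hat{v},\hat{v}'}},
\]

$\tau(r) = u$, $\tau'(r') = u'$, $\tau(s) = v$ for any $s \ne r$ and $\tau'(s') = v'$ for any $s' \ne r'$.

Since $k$ is to be the reciprocal of the maximum of $F(A^{\tau,\tau'})$, $k$ is found to be the following value.

\[
k = 1 + (|\mathcal{R}| - 1) \min_{u, v \in V,\; u', v' \in V'} \frac{A_{u,v'} A_{v,u'}}{A_{u,u'} A_{v,v'}}
\]

It is easy to confirm that the above equation also holds when $|\mathcal{R}| = 1$. In this case, since only one $\pi$ exists
(denoted as $\hat{\pi}$), $k$ equals 1 as follows.

\[
k = \frac{\sum_{\pi} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}}{\sum_{\pi(r) = r'} \prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\pi(s)}}
  = \frac{\prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\hat{\pi}(s)}}{\prod_{s \in \mathcal{R}} A^{\tau,\tau'}_{s,\hat{\pi}(s)}}
  = 1
  = 1 + (|\mathcal{R}| - 1) \min_{u, v \in V,\; u', v' \in V'} \frac{A_{u,v'} A_{v,u'}}{A_{u,u'} A_{v,v'}}
  \quad (\text{since } |\mathcal{R}| = 1)
\]

□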
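The reciprocal of $F(A^{\tau,\tau'})$ and the tightness of the bound can be checked by enumeration on a very small instance. This is a sketch under stated assumptions, not the authors' procedure: three records, two values, a hypothetical transition matrix with entries 3/4 and 1/4, and the worst-case tables described in the proof. The function name `reciprocal_of_F` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def reciprocal_of_F(A, tau, tau_prime, r, r_prime):
    """1 / F(A^{tau,tau'}), summing over all bijections pi by brute force."""
    n = len(tau)

    def weight(pi):
        # prod_s A^{tau,tau'}_{s,pi(s)} with A^{tau,tau'}_{s,s'} = A[tau(s)][tau'(s')]
        return prod(A[tau[s]][tau_prime[pi[s]]] for s in range(n))

    perms = list(permutations(range(n)))
    total = sum(weight(pi) for pi in perms)
    matching = sum(weight(pi) for pi in perms if pi[r] == r_prime)
    return total / matching


A = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
# Worst case from the proof: tau(r) = u, every other record has value v,
# and every value is retained in tau'.
tau = [0, 1, 1]
tau_prime = [0, 1, 1]
worst = reciprocal_of_F(A, tau, tau_prime, r=0, r_prime=0)
print(worst)  # 11/9, i.e. 1 + (3 - 1) * 1/9
```

For this matrix the minimum ratio is 1/9, so the formula gives $k = 1 + 2 \cdot 1/9 = 11/9$, which matches the enumerated value.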


We describe the multi-attribute version of Theorem 3.

Corollary 4. A PRAM mechanism whose transition probability matrices are $A_a$ for each attribute $a$ is a
P$k$-anonymization when $k$ is described as follows,

\[ k = 1 + (|\mathcal{R}| - 1) \prod_{a \in \mathcal{A}} AR_a \]

where $AR_a$ is

\[ AR_a = \min_{u, v \in V_a,\; u', v' \in V'_a} \frac{(A_a)_{u,v'}\, (A_a)_{v,u'}}{(A_a)_{u,u'}\, (A_a)_{v,v'}}. \]

The following corollary is applicable to retention-replacement perturbation.

Corollary 5. Retention-replacement perturbation whose retention probabilities are $p_a$ for each attribute $a \in \mathcal{A}$
is a P$k$-anonymization when $k$ is described as follows,

\[ k = 1 + (|\mathcal{R}| - 1) \prod_{a \in \mathcal{A}} AR_a \]

where $AR_a$ is

\[ AR_a = \frac{(1 - p_a)^2}{(1 + (|V_a| - 1)\, p_a)^2}. \]
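The closed form of $AR_a$ for retention-replacement perturbation can be compared against the general minimum ratio of Theorem 3. The sketch below assumes the usual form of retention-replacement (keep the value with probability $p$, otherwise draw uniformly from all $|V_a|$ values, including the original one); the exact definition is given in Section 2.3, so this matrix is an assumption here, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def retention_replacement_matrix(p, m):
    """A[u][v]: keep the value with probability p, else draw uniformly from all m values."""
    off = (1 - p) / m
    return [[p + off if u == v else off for v in range(m)] for u in range(m)]


def min_ratio(A):
    """min over u, v, u', v' of A[u][v'] A[v][u'] / (A[u][u'] A[v][v'])."""
    size = len(A)
    return min(
        (A[u][v2] * A[v][u2]) / (A[u][u2] * A[v][v2])
        for u, v, u2, v2 in product(range(size), repeat=4)
    )


def closed_form(p, m):
    """AR_a = (1 - p)^2 / (1 + (m - 1) p)^2."""
    return (1 - p) ** 2 / (1 + (m - 1) * p) ** 2


p, m = Fraction(1, 3), 5
print(min_ratio(retention_replacement_matrix(p, m)), closed_form(p, m))  # 4/49 4/49
```

Under this assumption the diagonal entries are $p + (1 - p)/|V_a|$ and the off-diagonal entries are $(1 - p)/|V_a|$, and the minimum is attained when both values are retained.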

Using Theorem 3, $k$ is easily calculated from the record count $|\mathcal{R}|$ and the transition probability matrix $A$.
For retention-replacement perturbation, $A$ is determined independently of the instance of private
data; $k$ is calculated from only the record count $|\mathcal{R}|$, the numbers of attribute values $|V_a|$, and the retention
probabilities $p_a$ for each attribute $a$.
Conversely, the retention probabilities $p_a$ can also be calculated from $k$ in retention-replacement perturbation. By letting all $p_a$ be the same $p$
over all attributes, the equation is transformed as follows.

\[ k = 1 + (|\mathcal{R}| - 1) \prod_{a \in \mathcal{A}} \frac{(1 - p)^2}{(1 + (|V_a| - 1)\, p)^2} \tag{7} \]

Since $k$ monotonically decreases on $0 < p < 1$, $p$ is easily and uniquely solved using, for example, the
bisection method for any $k$, $|\mathcal{R}|$ and $|V_a|$ (Algorithm 1). Note that $k$ is allowed to be a real number, for
example, $k = 1.5$.

Algorithm 1 Determining p in retention-replacement perturbation from k
input: $k \in \mathbb{R}$ ($k \ge 1$), $|\mathcal{R}| \in \mathbb{N}$, $|V_a|$ for each attribute $a$
output: retention probability $p$
1: Set $p_0 = 1/2$.
2: Run the bisection method with initial value $p_0$ for $p$ with respect to $k$ using Equation (7), and output the converged $p$.

For example, to ensure P100-anonymity on 100,000 records of data, $p$ is calculated as roughly 0.303,
where there are three attributes: sex, age from 20’s to 60’s, and 10-leveled annual income.
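Algorithm 1 can be sketched in a few lines. The following is one possible implementation, not the authors' code: it assumes that $k$ as a function of $p$ has the product form obtained from Corollary 5 with a common $p$, and it brackets the root on $[0, 1]$ so that the first midpoint coincides with $p_0 = 1/2$. The names `k_of_p` and `solve_p` are hypothetical.

```python
from math import prod


def k_of_p(p, n_records, value_counts):
    """k = 1 + (|R| - 1) * prod_a ((1 - p) / (1 + (|V_a| - 1) p))^2."""
    ratio = prod((1 - p) / (1 + (m - 1) * p) for m in value_counts)
    return 1 + (n_records - 1) * ratio ** 2


def solve_p(k, n_records, value_counts, tol=1e-12):
    """Bisection on [0, 1]; k_of_p is decreasing in p."""
    if not 1 <= k <= n_records:
        raise ValueError("k must lie between 1 and the record count")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2  # the first midpoint is p0 = 1/2
        if k_of_p(mid, n_records, value_counts) >= k:
            lo = mid  # still anonymous enough: a larger p is allowed
        else:
            hi = mid
    return lo


# Sex (2 values), age from 20's to 60's (5 values), 10-leveled income (10 values).
p = solve_p(100, 100_000, [2, 5, 10])
print(round(p, 3))  # 0.303
```

The returned value is the largest retention probability (up to the tolerance) for which the target $k$ is still met, which reproduces the figure of roughly 0.303 quoted in the text.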

When the record count is uncertain since the data are to be collected thereafter, it is sufficient to use
the expected record count. Even when the record count does not reach the expected value, Pk-anonymity
is still satisfied for the following reason. When each record in table $\tau'$ is anonymous due to an anonymous
communication channel, it can be said that only a part of table $\tau'$ is visible in the state in which $\tau'$ is being
collected. An estimation in such a situation is equivalent to that from the algorithm that ignores the absent
records. From Proposition 1, the algorithm cannot derive $\mathcal{E}(f_T, \tau', r, r') > 1/k$ if it is correct.

4.1 DP on PRAM in Addition to Pk-Anonymity

Regarding retention-replacement perturbation, we can derive Algorithm 2, which determines the parameter in
order to satisfy ε-DP, from Corollary 2.

Algorithm 2 Determining p from ε
input: $\varepsilon > 0$ and $|V_a|$ for each attribute $a$
output: retention probability $p$
1: Set $p_0 = 1/2$.
2: Run the bisection method with initial value $p_0$ for $p$ with respect to $\varepsilon$ using Equation (3), and output the converged $p$.

Combining Algorithm 2 with Algorithm 1, we have Algorithm 3, which determines the parameter in order
to satisfy both Pk-anonymity and ε-DP.
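The combination of the two parameter searches can be sketched as follows. This is an illustration under explicit assumptions: Equation (3) is not reproduced in this section, so the function `eps_of_p` assumes the form $\varepsilon = \sum_a \ln\frac{1 + (|V_a| - 1)p}{1 - p}$, which is what the per-attribute ratio of the largest to the smallest transition probability gives for the retention-replacement matrix assumed here; the actual Equation (3) should be substituted if it differs. All function names are hypothetical.

```python
from math import log, prod


def k_of_p(p, n_records, value_counts):
    """k as a function of a common retention probability p (product form of Corollary 5)."""
    ratio = prod((1 - p) / (1 + (m - 1) * p) for m in value_counts)
    return 1 + (n_records - 1) * ratio ** 2


def eps_of_p(p, value_counts):
    """Assumed form of Equation (3): sum_a ln((1 + (|V_a| - 1) p) / (1 - p))."""
    return sum(log((1 + (m - 1) * p) / (1 - p)) for m in value_counts)


def largest_p(ok, tol=1e-12):
    """Bisection for the largest p in [0, 1) with ok(p) true, assuming ok is monotone."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


def solve_p_both(k, eps, n_records, value_counts):
    """Take the smaller of the two retention probabilities, so both targets hold."""
    p_k = largest_p(lambda p: k_of_p(p, n_records, value_counts) >= k)
    p_eps = largest_p(lambda p: eps_of_p(p, value_counts) <= eps)
    return min(p_k, p_eps)
```

Taking the minimum is sufficient because both $k$ and the privacy level improve as $p$ decreases, so the smaller retention probability satisfies both requirements.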

Algorithm 3 Determining p from k and ε
input: $k \in \mathbb{R}$ ($k \ge 1$), $\varepsilon > 0$, $|\mathcal{R}| \in \mathbb{N}$, $|V_a|$ for each attribute $a$
output: retention probability $p$
1: Run Algorithm 1 and Algorithm 2 and let the results be $p_k$ and $p_\varepsilon$, respectively.
2: Output $\min(p_k, p_\varepsilon)$.

5 Experimental results

From the aspect of utility, we show that randomized databases protected by Pk-anonymity are available for
data analyses. We experimented with cross-tabulations (or contingency tables) using Pk-anonymity.

In the experiments discussed below, the dataset was randomized by retention-replacement perturbation,
and cross-tabulations were calculated using the reconstruction method [3]. The target dataset was the US
census dataset in the UCI Machine Learning Repository [5], which has 2,458,285 records. Out of this dataset,
we extracted and used 7 attributes, as shown in Table 1. Several attributes were rounded because they had
too many attribute values for cross-tabulation.


Table 1. Attributes and number of attribute values

Attribute                      Number of values
Sex                            2
Age[*]                         18
Total Pers. Inc. Signed[*]     12
Worked Last Yr. 1989           3
Worked Last Week               3
Ed. Attainment                 18
Travel Time to Work[*]         20

(Attributes marked with [*] are rounded.)
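The randomization and reconstruction steps used in these experiments can be illustrated for a single attribute. The sketch below is not the reconstruction method of [3] itself; it assumes that retention-replacement keeps a value with probability $p$ and otherwise draws uniformly from the whole domain, and it inverts the resulting expectation $E[y_v] = p\,x_v + (1 - p)\,|\mathcal{R}| / |V|$ in closed form. The domain, counts, and function names are hypothetical.

```python
import random
from collections import Counter


def randomize(values, domain, p, rng):
    """Retention-replacement: keep each value with probability p, else draw uniformly from domain."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def reconstruct_counts(randomized_counts, domain, p, n_records):
    """Invert E[y_v] = p * x_v + (1 - p) * n / |V| for each value v."""
    background = (1 - p) * n_records / len(domain)
    return {v: (randomized_counts.get(v, 0) - background) / p for v in domain}


domain = ["low", "mid", "high"]
original = ["low"] * 6000 + ["mid"] * 3000 + ["high"] * 1000
rng = random.Random(0)
released = randomize(original, domain, p=0.3, rng=rng)
estimate = reconstruct_counts(Counter(released), domain, 0.3, len(original))
```

The reconstructed counts always sum to the record count, and their sampling error shrinks relative to the counts as the number of records grows, which is consistent with the trend reported for Figure 2.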


Figure 2 shows L1-norm errors and ε by varying the record count with fixed $k = 2$ and four attributes:
Sex, Age, Total Pers. Inc. Signed, and Worked Last Yr. 1989. The L1-norm is a normalized distance between
original cross-tabulated aggregates and reconstructed aggregates, given as the following $d$, where each $x_v$
and $y_v$ are the counted aggregates of the private table and the reconstructed aggregates corresponding to
$v \in V$, respectively.

\[ d = \frac{\sum_{v \in V} |x_v - y_v|}{|\mathcal{R}|} \]
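The error measure $d$ is simple to compute from two cross-tabulations. The following sketch stores each table as a dictionary from cell labels to counts; the cell labels and numbers are made up for illustration, and the function name `l1_error` is hypothetical.

```python
def l1_error(original_counts, reconstructed_counts, n_records):
    """d = sum_v |x_v - y_v| / |R| over all cells v of the cross-tabulation."""
    cells = set(original_counts) | set(reconstructed_counts)
    return sum(
        abs(original_counts.get(v, 0) - reconstructed_counts.get(v, 0)) for v in cells
    ) / n_records


x = {("F", "20s"): 40, ("F", "30s"): 10, ("M", "20s"): 30, ("M", "30s"): 20}
y = {("F", "20s"): 38.5, ("F", "30s"): 12.0, ("M", "20s"): 31.0, ("M", "30s"): 18.5}
print(l1_error(x, y, 100))  # 0.06
```

Cells missing from one table are treated as zero counts, so sparse cross-tabulations can be compared directly.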

From the graph, it seems that errors become smaller as the record count increases. When only 245 records
were used, errors were quite high. However, there were almost no errors when all 2,458,285 records were
used. This is due to two reasons. First, in the reconstruction method, a large number of records generally
results in accurate analysis results for a fixed retention probability. Second, since a larger record count also yields
a higher k in Pk-anonymity for the same p according to Theorem 3, one can set a relatively high p. Regarding
ε-DP, ε increases as the record count increases. This is because the retention probability p monotonically increases
with the record count by Equation (7) when k is fixed.

Figure 3 shows L1-norm errors and ε by varying the number of attributes with fixed k = 2, using all the
records of the dataset. Attributes were added in the same order as in Table 1. Figure 4 shows L1-norm
errors and ε by varying k with fixed attributes. All the records from the dataset were used, and the attributes
were the same four attributes as in Figure 2. From these graphs, it seems that errors become larger as the
number of attributes or k increases. However, the increment is quite small.


Fig. 2. Reconstruction errors and ε by varying number of records

Fig. 4. Reconstruction errors and ε by varying k (plot title: k, Reconstruction Error and ε)


6 Conclusions

In the field of anonymized microdata release, we mainly presented the following two theories. We first
proposed an anonymity notion, Pk-anonymity, which is an extension of k-anonymity to randomized microdata,
and its intuitive meaning is “no one estimates which person the record came from with more than 1/k
probability.” We then applied Pk-anonymity to PRAM. PRAM is known to satisfy ε-DP; thus, it achieves
k-anonymous and ε-differentially private microdata release.

The contributions of the paper are:

— an anonymity notion called Pk-anonymity,

— proofs that Pk-anonymity is an exact mathematical extension of k-anonymity,

— a formula for calculating k on PRAM,

— algorithms for determining the parameter of the retention-replacement perturbation according to k and ε,

— experimental results to empirically analyze the trade-off relation between utility and privacy/anonymity
using a real dataset.

Theoretical analyses and further experiments in real applications regarding utility are future work.
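A Proof of Theorem 1

Let $\tau_1, \tau_2 \in \mathcal{T}$ be arbitrary private tables differing by one record (i.e., one and only one $r \in \mathcal{R}$ exists and
satisfies $\tau_1(r) \ne \tau_2(r)$ and $\tau_1|_{\mathcal{R} \setminus \{r\}} = \tau_2|_{\mathcal{R} \setminus \{r\}}$), and let $\tau' \in \mathcal{T}'$ be an arbitrary randomized table.

The proposition we should show is the following inequality.

\[
\max_{\tau_1, \tau_2, \tau'} \frac{\Pr[\Delta(\tau_1) = \tau']}{\Pr[\Delta(\tau_2) = \tau']}
\le \exp(\varepsilon) = \max_{v_1, v_2 \in V,\; v' \in V'} \frac{A_{v_1,v'}}{A_{v_2,v'}}
\]

The left-hand side of the above inequality is transformed as follows.

\[
\begin{aligned}
\max_{\tau_1, \tau_2, \tau'} \frac{\Pr[\Delta(\tau_1) = \tau']}{\Pr[\Delta(\tau_2) = \tau']}
&= \max_{\tau_1, \tau_2, \tau'} \frac{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \prod_{s \in \mathcal{R}} \Pr[(\Delta(\tau_1))(s) = \tau'(\pi(s))]}
                                      {\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \prod_{s \in \mathcal{R}} \Pr[(\Delta(\tau_2))(s) = \tau'(\pi(s))]} \\
&= \max_{\tau_1, \tau_2, \tau'} \frac{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \prod_{s \in \mathcal{R}} A_{\tau_1(s), \tau'(\pi(s))}}
                                      {\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \prod_{s \in \mathcal{R}} A_{\tau_2(s), \tau'(\pi(s))}} \\
&= \max_{\tau_1, \tau_2, \tau'} \frac{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \Bigl( \prod_{s \ne r} A_{\tau_1(s), \tau'(\pi(s))} \Bigr) A_{\tau_1(r), \tau'(\pi(r))}}
                                      {\sum_{\pi : \mathcal{R} \to \mathcal{R}'} \Bigl( \prod_{s \ne r} A_{\tau_2(s), \tau'(\pi(s))} \Bigr) A_{\tau_2(r), \tau'(\pi(r))}}
\end{aligned}
\tag{8}
\]

Now, let $v_1$, $v_2$ and $\bar{\tau}$ be $\tau_1(r)$, $\tau_2(r)$ and $\tau_1|_{\mathcal{R} \setminus \{r\}}$ ($= \tau_2|_{\mathcal{R} \setminus \{r\}}$), respectively. Note that these three variables
determine $\tau_1$ and $\tau_2$ uniquely. Furthermore, let $a_{\pi, v_2, \tau'}$, $b_{\pi, v_1, \tau'}$ and $x_{\pi, \bar{\tau}, \tau'}$ denote $A_{\tau_2(r), \tau'(\pi(r))}$, $A_{\tau_1(r), \tau'(\pi(r))}$
and $\prod_{s \ne r} A_{\tau_1(s), \tau'(\pi(s))}$ ($= \prod_{s \ne r} A_{\tau_2(s), \tau'(\pi(s))}$), respectively. Using these representations, we denote the above formula
with the following formula.

\[
(8) = \max_{\tau_1, \tau_2, \tau'} \frac{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} b_{\pi, v_1, \tau'}\, x_{\pi, \bar{\tau}, \tau'}}{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} a_{\pi, v_2, \tau'}\, x_{\pi, \bar{\tau}, \tau'}}
    = \max_{\bar{\tau}, v_1, v_2, \tau'} \frac{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} b_{\pi, v_1, \tau'}\, x_{\pi, \bar{\tau}, \tau'}}{\sum_{\pi : \mathcal{R} \to \mathcal{R}'} a_{\pi, v_2, \tau'}\, x_{\pi, \bar{\tau}, \tau'}}
\tag{9}
\]

By fixing $v_1$, $v_2$ and $\tau'$, we can apply Lemma 3 to remove $\bar{\tau}$ and $x_{\pi, \bar{\tau}, \tau'}$ from the above maximum.

\[
(9) \le \max_{v_1, v_2, \tau'} \max_{\pi} \frac{b_{\pi, v_1, \tau'}}{a_{\pi, v_2, \tau'}}
    = \max_{v_1, v_2, \tau', \pi} \frac{A_{\tau_1(r), \tau'(\pi(r))}}{A_{\tau_2(r), \tau'(\pi(r))}}
    \le \max_{v_1, v_2 \in V,\; v' \in V'} \frac{A_{v_1,v'}}{A_{v_2,v'}}
\tag{10}
\]

□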
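The inequality of Theorem 1 can be checked by enumeration on tiny instances. The sketch below is illustrative only: it assumes, as in the transformation above, that the probability of releasing $\tau'$ from $\tau$ is proportional to $\sum_{\pi} \prod_{s} A_{\tau(s), \tau'(\pi(s))}$, uses a hypothetical two-value transition matrix, and introduces the helper names `release_weight` and `max_release_ratio`.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod


def release_weight(A, tau, tau_prime):
    """Sum over bijections pi of prod_s A[tau(s)][tau'(pi(s))] (the common 1/|R|! is dropped)."""
    n = len(tau)
    return sum(
        prod(A[tau[s]][tau_prime[pi[s]]] for s in range(n))
        for pi in permutations(range(n))
    )


def max_release_ratio(A, n):
    """Largest ratio of release weights over neighbouring private tables and released tables."""
    m = len(A)
    tables = list(product(range(m), repeat=n))
    best = Fraction(0)
    for t1, t2 in product(tables, repeat=2):
        if sum(a != b for a, b in zip(t1, t2)) != 1:
            continue  # neighbouring tables differ in exactly one record
        for tp in tables:
            best = max(best, release_weight(A, t1, tp) / release_weight(A, t2, tp))
    return best


A = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
# Right-hand side of the inequality: the largest ratio within one column of A.
bound = max(A[v1][v] / A[v2][v] for v1, v2, v in product(range(2), repeat=3))
```

With a single record the bound is attained exactly, while with more records the shuffled release can only make neighbouring tables harder to distinguish.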